Well-Defined Benchmarking Method For Second Generation Read Mapping
Abstract
Each match with distance ≤ k − 2 implies at least one feasible match on each side of it. Figure 1 in the main article shows an example, and Figure 2 in the main article shows this as well. For k = 5, the third end position of the third lower branch in the left tree implies feasible matches to its left and right. This problem of implied neighbouring matches is partially solved by the definition of neighbour equivalence in Section 2.4 of the main article. Merging matches in this way has a limitation, though: for k = 4, the end position marked with ? (at the fourth lower branch of the left tree) separates the matches to its left and right. As explained in Section 2.4 of the main article, alignments that share their trace are essentially the same, so it is desirable to merge the matches on either side of such separating positions. This is the reasoning behind defining trace equivalence and combining it into ≡.
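The first observation can be checked numerically. The sketch below is illustrative only and is not the benchmarking method of the main article: it assumes plain unit-cost edit distance and uses Sellers' semi-global dynamic programming, and the function names `end_position_distances`, `feasible_end_positions`, and `neighbours_feasible` are hypothetical. It computes, for every end position in a text, the best distance of any match ending there, and then tests whether every end position with distance ≤ k − 2 has feasible neighbours (distance ≤ k) on both sides.

```python
def end_position_distances(pattern, text):
    """Return a list d where d[j] is the minimal edit distance between
    `pattern` and any substring of `text` ending at position j
    (j = 0 is the empty prefix). Sellers' algorithm, unit costs assumed."""
    m = len(pattern)
    # Column for the empty text prefix: pattern[:i] against nothing costs i.
    col = list(range(m + 1))
    last_row = [col[m]]
    for ch in text:
        new = [0]  # a match may start at any text position, so row 0 is free
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new.append(min(col[i - 1] + cost,  # match or mismatch
                           col[i] + 1,         # extra text character
                           new[i - 1] + 1))    # extra pattern character
        col = new
        last_row.append(col[m])
    return last_row


def feasible_end_positions(distances, k):
    """End positions whose best match has distance at most k."""
    return [j for j, d in enumerate(distances) if d <= k]


def neighbours_feasible(distances, k):
    """True if every end position with distance <= k - 2 has feasible
    neighbours (distance <= k) on every side that exists in the text."""
    for j, d in enumerate(distances):
        if d <= k - 2:
            for n in (j - 1, j + 1):
                if 0 <= n < len(distances) and distances[n] > k:
                    return False
    return True


if __name__ == "__main__":
    d = end_position_distances("ACGT", "TTACGTTT")
    print(d)
    print(feasible_end_positions(d, 2))
    print(neighbours_feasible(d, 2))
```

Adjacent end positions differ in distance by at most one in this model, which is why a match with distance ≤ k − 2 always drags feasible neighbours along with it. Deciding which of those neighbours count as the same match is what the equivalence relations of the main article address, and that step is not reproduced here.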
Publication date: 2011